Bilingual Example Segmentation based on Markers Hypothesis

Authors

  • Alberto Simões
  • José João Almeida
Abstract

The Marker Hypothesis was first proposed by Thomas Green in 1979. It is a psycholinguistic hypothesis stating that every language has a set of words that mark the boundaries of phrases in a sentence. Although it remains an unproven hypothesis, tests have shown that chunking based on it gives results comparable to those of basic shallow parsers, with higher efficiency. The chunking algorithm based on the Marker Hypothesis is simple, fast and almost language independent. It depends on a list of closed-class words, which is already available for most languages. This makes it suitable for bilingual chunking, since no separate shallow parser is required for each language. This paper discusses the use of the Marker Hypothesis combined with Probabilistic Translation Dictionaries to extract example-based machine translation resources from parallel corpora.
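
The chunking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the marker list below is a tiny hypothetical sample of English closed-class words, the function name `marker_chunks` is invented for illustration, and the rule that a chunk must contain at least one non-marker word before it can be closed is an assumed convention rather than something stated in the abstract.

```python
# Hypothetical, deliberately tiny marker list for English. A real list would
# cover determiners, prepositions, conjunctions, pronouns and so on.
MARKERS = {"the", "a", "an", "of", "in", "on", "to", "with", "and", "that"}


def marker_chunks(sentence, markers=MARKERS):
    """Split a sentence into chunks, opening a new chunk at each marker word.

    Assumption: a chunk is only closed once it holds at least one non-marker
    word, so consecutive markers ("in the") stay together at a chunk start.
    """
    chunks = []
    current = []
    has_content = False  # True once the current chunk has a non-marker word
    for token in sentence.split():
        is_marker = token.lower() in markers
        if is_marker and has_content:
            # A marker after content words signals a phrase boundary.
            chunks.append(current)
            current = []
            has_content = False
        current.append(token)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


print(marker_chunks("the cat sat on the mat in the house"))
```

Because the only language-specific input is the marker list, the same function could be run on each side of a sentence pair with a different list, which is what makes the approach attractive for bilingual chunking.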

Similar resources

A Structural-Based Approach to Cantonese-English Machine Translation

In this paper, we present an integrated method to machine translation from Cantonese to English text. Our method combines example-based and rule-based methods that rely solely on example translations kept in a small Example Base (EB). One of the bottlenecks in example-based Machine Translation (MT) is a lack of knowledge or redundant knowledge in its bilingual knowledge base. In our method, a f...

An Investigation of the Effect of Bilingual Education on Language Achievement of Iranian Pre-intermediate EFL Learners

The present study investigated the impact of bilingual education on language achievement of Iranian Pre-intermediate EFL learners. It actually used bilingual education through content-based methodology or subject matter such as math, science and reading. To this purpose, the researchers used 40 Pre-intermediate EFL participants who were studying English conversation at a private language insti...

Pre-processing of Bilingual Corpora for Mandarin-English EBMT

Pre-processing of bilingual corpora plays an important role in Example-Based Machine Translation (EBMT) and Statistical-Based Machine Translation (SBMT). For our Mandarin-English EBMT system, pre-processing includes segmentation for Mandarin, bracketing for English and building a statistical dictionary from the corpora. We used the Mandarin segmenter from the Linguistic Data Consortium (LDC). I...

Example-based Segmentation of Swedish Compounds in a Swedish–English bilingual corpus and the possibility of Evaluating Compound Links based on that Segmentation

In this paper an algorithm for segmenting Swedish compounds in a linking material is presented. The algorithm does the segmentation by looking at the example set by the corresponding English compound. The idea that this kind of segmentation can be used to evaluate the link between the two compounds is also tested. This would be possible because links where the algorithm cannot find suitable Swe...

Publication date: 2009